Abstract

In this paper, we use belief-propagation techniques to develop
fast algorithms for image inpainting. Unlike traditional
gradient-based approaches, which may require many iterations 
to converge, our techniques achieve competitive results af- 
ter only a few iterations. On the other hand, while belief- 
propagation techniques are often unable to deal with high-order 
models due to the explosion in the size of messages, we avoid 
this problem by approximating our high-order prior model us- 
ing a Gaussian mixture. By using such an approximation, we 
are able to inpaint images quickly while at the same time re- 
taining good visual results. 

1 Introduction 

In order to restore a corrupted image, one needs a model of 
how uncorrupted (i.e. natural) images appear. In the Markov 
random field Bayesian paradigm for image restoration [ ], natural
images are modeled via an image prior. This is a probabilistic
model that encodes how natural images behave locally
(in the vicinity of every pixel). An inference algorithm is then 
used to restore the image, whose aim is to find a consensus 
among all vicinities on which global solution is most compat- 
ible with a natural image (while still similar to the corrupted 
image — a corrupted tree should be restored as an uncorrupted 
tree, not as an uncorrupted house). The simplest example of 
an image prior is perhaps the pairwise model presented in [ ] 
- which simply expresses that neighboring pixels are likely to 
share similar gray-levels. However, it is easy to see that such 
a model fails to capture a great deal of important information 
about natural images. For example, 'edges' are highly penal- 
ized by such a prior, and it is unable to encode any information 
about 'texture'. Only by using higher-order priors will one be 
able to capture this important information. 

An example of a high-order prior is the field of experts model 
[16], which is parameterized as the product of a selection of fil- 
ters (or 'experts'). Each of these filters is typically a patch of 
3 x 3 or 5 x 5 pixels, resulting in a 9- or 25-dimensional prior
respectively. Unfortunately, such a high-dimensional prior lim- 
its the practicality of many inference algorithms. Even though 
it may be possible to use smaller (for example, 2x2) patches 
[ ], we are still limited by the number of gray-levels used to 
properly represent natural images (typically 256). Updating a 
single pixel using a Gibbs sampler (for example) requires us 
to consider all 256 possible gray-levels. Since Gibbs samplers 
may typically take hundreds (or thousands) of iterations to con- 
verge, they are simply impractical in this setting. 

While belief-propagation techniques tend to converge in 
fewer iterations [ ], they are often equally impractical. Since 
adjacent cliques may share as many as 6 nodes (using a 3 x 3 
model), the size of the messages passed between them may be 
as large as $256^6$. Even if we only use 2 x 2 cliques, the size
of our messages may still be as large as $256^2$, which remains
impractical for many purposes. Although inpainting has been 
previously approached using belief-propagation techniques in 
[ ], they do not deal with 2 x 2 (or larger) models, and their 
approach can therefore only capture limited textural informa- 
tion. 
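To make the size argument concrete, the following sketch counts the entries of a discrete message. It assumes 256 gray-levels per pixel and that a message is a full table over the pixels shared by two adjacent cliques (up to 6 for 3 x 3 cliques, up to 2 for 2 x 2 cliques). The name `message_entries` is ours and is used only for illustration.

```python
GRAY_LEVELS = 256  # number of gray-levels per pixel, as assumed in the text


def message_entries(shared_pixels: int, levels: int = GRAY_LEVELS) -> int:
    """Number of table entries in a discrete message over `shared_pixels` pixels."""
    return levels ** shared_pixels


# Adjacent 3 x 3 cliques can share up to 6 pixels; adjacent 2 x 2 cliques up to 2.
for name, shared in [("3 x 3", 6), ("2 x 2", 2)]:
    print(f"{name} model: {message_entries(shared):,} entries per message")
```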

To avoid the above problems, image restoration is typically 
performed using gradient-ascent, thereby eliminating the need
to deal with many discrete gray-levels, and avoiding expensive
sampling [ ]. While gradient-based approaches are generally
considered to be fast, they may still require several thousand 
iterations in order to converge, and even then will converge only 
to a local optimum. 

In this paper, we propose a method that gives us the best of 
both worlds: we manage to render belief-propagation practi- 
cal using a high-order (2 x 2) model, and use it for the task of 
image inpainting. By using a nonparametric prior^ we avoid
the need to discretize images, resulting in much smaller mes- 
sages being passed between cliques. Our experiments show that 
belief-propagation techniques are able to produce competitive 
results after only a single iteration, rendering them faster than 
many gradient-based approaches, while retaining similar visual 
quality of the restoration. 



^The term 'nonparametric' is something of a misnomer, since the prior is 
actually approximated using a mixture of Gaussians. This term was originally 
used in [ ], and is maintained here for consistency.

2 Background

In this section, we define the Markov random field (MRF) im- 
age prior to be used in our model. Although we shall not 
present any significant results in terms of learning the prior, 
we have nevertheless made a number of modifications to the 
'standard' image prior in order to render inference tractable. 

2.1 The 'Field of Experts' Image Prior

The Hammersley-Clifford theorem states that the joint proba- 
bility distribution of a Markov random field with clique set C 
(assuming maximal cliques) is given by 

$$p(\mathbf{x}) = \frac{1}{Z} \prod_{c \in C} \phi_c(\mathbf{x}_c) \tag{1}$$

(where $\mathbf{x}_c$ is the set of variables in $\mathbf{x}$ belonging to the $c$th clique;
$Z$ is a normalization constant) [ ]. When dealing with images,
the $\phi_c$s are often assumed to be homogeneous [ ], meaning that
the prior can be defined entirely in terms of a single potential
function, $\phi$. In the Field of Experts model [ ], this potential
function is assumed to take the form of a product of experts [ ],
in which each 'expert' is the response of the image patch ($\mathbf{x}_c$)
to a particular filter ($J_f$). That is, the potential function takes
the form

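$$\phi(\mathbf{x}_c; J, \alpha) = \prod_{f=1}^{F} \phi_f(\mathbf{x}_c; J_f, \alpha_f) \tag{2}$$

(where the $\alpha_f$s are simply weighting coefficients controlling
the importances of the filters). Specifically, each expert is assumed
to take the form of a Student's T-distribution, namely

$$\phi_f(\mathbf{x}_c; J_f, \alpha_f) = \left(1 + \tfrac{1}{2} \langle J_f, \mathbf{x}_c \rangle^2\right)^{-\alpha_f} \tag{3}$$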

Although [16] use contrastive divergence learning to select the 
filters and alphas, it has been shown that the filters can more 
easily be selected using principal component analysis (PCA) 
[15]. This leaves only the problem of learning the alphas, which 
we shall deal with in section 3. 
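As a minimal sketch of how the potential in equations (2) and (3) is evaluated for one patch, consider the following. The filters and alpha values below are hypothetical placeholders chosen for readability, not the learned ones, and a patch is assumed to be flattened row by row.

```python
import math


def expert(patch, filt, alpha):
    """Student's-T expert of equation (3): (1 + 0.5 * <J, x>^2) ** (-alpha)."""
    response = sum(j * x for j, x in zip(filt, patch))
    return (1.0 + 0.5 * response ** 2) ** (-alpha)


def foe_potential(patch, filters, alphas):
    """Product of experts of equation (2) for one flattened image patch."""
    return math.prod(expert(patch, f, a) for f, a in zip(filters, alphas))


# Hypothetical 2 x 2 filters (flattened row by row) and weights, for illustration only.
filters = [
    [0.5, 0.5, -0.5, -0.5],   # top row minus bottom row
    [0.5, -0.5, 0.5, -0.5],   # left column minus right column
    [0.5, -0.5, -0.5, 0.5],   # diagonal structure
]
alphas = [1.0, 1.0, 1.0]

flat_patch = [100, 100, 100, 100]   # uniform patch: every filter response is zero
edge_patch = [100, 100, 0, 0]       # strong horizontal edge

print(foe_potential(flat_patch, filters, alphas))
print(foe_potential(edge_patch, filters, alphas))
```

The uniform patch receives the maximum potential, while the patch containing an edge is penalized through the first expert only; the heavy tails of the Student's T-distribution keep that penalty polynomial rather than exponential in the filter response.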

2.2 Belief-Propagation 

Inference in the MRF setting can be formulated as a message 
passing problem. Two common message passing algorithms 
exist, namely the junction-tree algorithm, and loopy belief- 
propagation [ ]. In our case, which algorithm should be ap- 
plied depends upon the 'shape' of the region being inpainted. 
We will give only a brief overview of these algorithms in order 
to explain why it is infeasible to apply them directly when us- 
ing the above prior. A more complete specification is given in 
[z]; similar ideas are also used in an image inpainting setting in 
[12]. 

Belief-propagation algorithms work by having cliques pass 
'messages' to other cliques which share one or more nodes 



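in common. If we denote by $S_{i,j}$ the intersection of the two
cliques $\mathbf{x}_i$ and $\mathbf{x}_j$, and denote by $\Gamma_{\mathbf{x}_i}$ the neighbors of $\mathbf{x}_i$ (i.e.
those cliques which share one or more nodes with $\mathbf{x}_i$), then the
message, $M_{i \to j}$, sent from $\mathbf{x}_i$ to $\mathbf{x}_j$ is given by

$$M_{i \to j}(S_{i,j}) = \sum_{\mathbf{x}_i \setminus \mathbf{x}_j} \phi(\mathbf{x}_i) \prod_{\mathbf{x}_k \in (\Gamma_{\mathbf{x}_i} \setminus \{\mathbf{x}_j\})} M_{k \to i}(S_{k,i}). \tag{4}$$

That is, the outgoing message from $\mathbf{x}_i$ to $\mathbf{x}_j$ is defined as the
product of the local potential ($\phi(\mathbf{x}_i)$) with the incoming messages
from all neighbors except $\mathbf{x}_j$, marginalized over the variables
in $\mathbf{x}_i$ but not in $\mathbf{x}_j$. Once all messages have been sent, the
final distribution of $\mathbf{x}_i$ ($D_i(\mathbf{x}_i)$) is given by

$$D_i(\mathbf{x}_i) = \phi(\mathbf{x}_i) \prod_{\mathbf{x}_k \in \Gamma_{\mathbf{x}_i}} M_{k \to i}(S_{k,i}). \tag{5}$$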

Even when using the 2 x 2 model, evaluating $\phi(\mathbf{x}_i)$ requires us
to consider $256^4$ possible gray-level combinations. Although it
may be possible to approximate the marginal being computed
in equation (4) without computing $\phi(\mathbf{x}_i)$ explicitly [ ], the
message itself still contains $256^2$ elements. This problem is
dealt with in [ ] by using a factor-graph [ ], which requires
only that one-dimensional marginals are computed; however,
the running time of their method is still linear in the number
of gray-levels, in addition to the fact that the factor-graph fails
to fully capture the conditional independencies implied by the
model.
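The cost of the discrete message in equation (4) can be seen in a brute-force sketch. It assumes two horizontally adjacent 2 x 2 cliques with no other neighbors, so the product of incoming messages is empty. The `smoothness` potential is a toy stand-in (our assumption), and only 4 gray-levels are used so that the example runs instantly.

```python
from itertools import product


def message(potential, levels):
    """Equation (4) for two adjacent 2 x 2 cliques with no other neighbors.

    The sending clique holds pixels (a, b, c, d); pixels b and d are shared
    with the receiving clique, so a and c are summed out.
    """
    table = {}
    for b, d in product(range(levels), repeat=2):
        table[(b, d)] = sum(
            potential((a, b, c, d)) for a, c in product(range(levels), repeat=2)
        )
    return table


def smoothness(patch):
    """Toy potential (an assumption, not the learned prior): favors flat patches."""
    mean = sum(patch) / len(patch)
    return 1.0 / (1.0 + sum((p - mean) ** 2 for p in patch))


LEVELS = 4  # tiny on purpose; the text uses 256
msg = message(smoothness, LEVELS)
print(len(msg), "entries, each a sum over", LEVELS ** 2, "terms")
```

With `LEVELS = 256` the same loop would touch $256^4$ patch configurations per message, which is the blow-up the Gaussian formulation is meant to avoid.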

As a result, we seek $\phi$ in such a form that the sum in equation
(4) may be replaced by an integral. In [ ], the authors defined
such a model in which the potential function takes the form of
a Gaussian mixture, that is, with $\phi$ taking the form

$$\phi(\mathbf{x}_c) = \sum_{i=1}^{N} \beta_i \, e^{-(\mathbf{x}_c - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x}_c - \boldsymbol{\mu}_i)} \tag{6}$$

(this is sometimes known as a Gaussian random field [ , ]).
Unfortunately, the method they use to learn these mixtures appears
to be applicable only to low-order models (the largest
mixture models they learn are 3-dimensional).

In the remainder of this paper, we will show that the experts
($\phi_f$s) can be approximated as a Gaussian mixture, resulting
in a high-order model which closely matches the one given in 
equations (2) and (3). We will describe belief-propagation in 
this setting, and show how this approach can be used for fast 
image inpainting. 

3 Approximating the Prior 

In [15], the authors showed that the filters ($J_f$s) in equation (2)
can be learned by performing a PCA on a collection of natural
image patches. Here we follow this idea. To learn our filters
(for a 2 x 2 model), we randomly cropped 50,000 2 x 2 patches
from images in the Berkeley Segmentation Database [ ], and
used their principal components as our filters. It was found in
[16] that the first component of such a PCA always corresponds
to a uniform gray patch, which should be ignored in order to
obtain a model invariant to intensity; hence we only use the last
three filters for our 2 x 2 model. The resulting patches appear to
make sense visually, and are shown in figure 1. This technique
can also be used to learn 3 x 3 or 5 x 5 models (resulting in 8
or 24 filters, respectively), which are not shown.

Figure 1: The three filters used in our 2 x 2 model.
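The filter-learning step can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the patches are synthetic (a shared brightness plus small noise) rather than crops from the Berkeley Segmentation Database, and `learn_filters` is a name introduced here.

```python
import numpy as np


def learn_filters(patches: np.ndarray) -> np.ndarray:
    """Return the principal components of flattened patches, dropping the first.

    `patches` has shape (num_patches, n); the result has shape (n - 1, n),
    ordered by decreasing variance.
    """
    centered = patches - patches.mean(axis=0)
    covariance = centered.T @ centered / (len(patches) - 1)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)  # ascending order
    order = np.argsort(eigenvalues)[::-1]                   # descending variance
    components = eigenvectors[:, order].T
    return components[1:]  # discard the (near-uniform) first component


# Synthetic stand-in for natural 2 x 2 patches: a shared brightness plus small noise.
rng = np.random.default_rng(0)
brightness = rng.normal(128.0, 50.0, size=(50_000, 1))
patches = brightness + rng.normal(0.0, 5.0, size=(50_000, 4))

filters = learn_filters(patches)
print(filters.shape)  # (3, 4)
```

Because the discarded component is close to a uniform patch and the remaining components are orthogonal to it, the entries of each retained filter sum to approximately zero.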

This model also requires that we learn the 'importances'
($\alpha_f$) of each filter. From equation (3), it can be seen that the
$\alpha_f$s simply control the shape (or 'peakedness') of the Student's
T-distribution. Rather than try to learn the $\alpha_f$s explicitly, we
will learn them implicitly through our approximation.

In order to approximate the experts ($\phi_f$s), we first calculated
the inner products ($\langle J_f, \mathbf{x}_c \rangle$) for a random selection of 5,000
image patches (again cropped from the Berkeley Segmentation
Database [ ]). A normal probability plot of the data (against
the first filter) is shown in figure 2 - this plot reveals that the
data is more heavily tailed than would be suggested by a normal 
distribution, indicating that the Student's T-distribution may in- 
deed be valid. However, rather than assume that this data is 
generated according to a Student's T-distribution, we simply 
tried to approximate it directly using a mixture of Gaussians. 

In order to estimate the distribution governing this data, we
used the expectation-maximization (EM) algorithm [ ], assuming
that the set of inner products for each filter was generated
by a mixture of three Gaussians. All of our parameters to be
learned ($\theta = (\beta, \mu, \sigma)$) were initialized by using a K-means
clustering [13] on the original inner products. We used this approach
to learn a separate mixture model for each expert. This
algorithm produces an approximation of the form

$$\phi_f(\mathbf{x}_c; \theta, J) \simeq \sum_{i=1}^{N} \beta_i \exp\left(-\frac{(\langle J, \mathbf{x}_c \rangle - \mu_i)^2}{2\sigma_i^2}\right). \tag{7}$$
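The fitting step can be sketched in plain Python as below. Several things here are assumptions made for the sake of a short, runnable example: the data are synthetic heavy-tailed samples rather than real filter responses, the initialization splits the sorted data into equal chunks instead of using K-means, and the components are normalized densities (so the returned weights correspond to the betas only up to each component's normalization constant).

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_mixture(data, k=3, iterations=50):
    """Fit a k-component 1-D Gaussian mixture with EM; returns (weights, means, stds)."""
    ordered = sorted(data)
    n = len(ordered)
    # Simplified initialization (the paper uses K-means): split the sorted data
    # into k equal chunks and use each chunk's statistics.
    chunks = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [1.0 / k] * k
    means = [sum(c) / len(c) for c in chunks]
    stds = [max(1e-3, math.sqrt(sum((x - m) ** 2 for x in c) / len(c)))
            for c, m in zip(chunks, means)]

    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            mass = sum(r[j] for r in resp) or 1e-300
            weights[j] = mass / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / mass
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / mass
            stds[j] = max(1e-3, math.sqrt(var))
    return weights, means, stds


# Synthetic heavy-tailed "filter responses": mostly small values, occasional large ones.
random.seed(0)
responses = [random.gauss(0.0, 1.0 if random.random() < 0.8 else 8.0)
             for _ in range(2000)]
weights, means, stds = fit_mixture(responses)
for w, m, s in zip(weights, means, stds):
    print(f"weight={w:.3f} mean={m:+.3f} std={s:.3f}")
```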



The alpha terms are no longer relevant - the 'shape' of the dis- 
tribution is implicitly controlled by the other parameters. How- 
ever, the expression in equation (7) is not yet in the same form 
as equation (6). Hence we need to solve the system 



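$$\exp\left(-\frac{(\langle J, \mathbf{x} \rangle - \mu)^2}{2\sigma^2}\right) = \exp\left(-(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right). \tag{8}$$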



Figure 2: Solid line: A normal probability plot of the inner
products (horizontal axis) against their corresponding normal
probabilities (vertical axis). This plot shows that the inner products
are more heavily tailed than would be suggested by a normal
distribution (dotted line).

That is, we are trying to solve for $\Sigma^{-1}$ (a matrix) and $\boldsymbol{\mu}$ (a vector),
in terms of $J$ (a vector) and $\mu$ (a scalar). It is not difficult
to see that the only solution for $\Sigma^{-1}$ is

$$\Sigma^{-1} = \frac{1}{2\sigma^2} \begin{pmatrix} J_1^2 & \cdots & J_1 J_n \\ \vdots & \ddots & \vdots \\ J_n J_1 & \cdots & J_n^2 \end{pmatrix} \tag{9}$$

(where $n$ is the size of the filter $J$ - in our case, $n = 4$). On the other
hand, there are infinitely many solutions for $\boldsymbol{\mu}$. One obvious solution
is

$$\boldsymbol{\mu}_i = \frac{\mu}{\sum_{j=1}^{n} J_j}. \tag{10}$$

However, we found for all of our filters that $\sum_{i=1}^{n} J_i \approx 0$,
meaning that this solution would be highly unstable. A more
stable solution (which we used) is given by

(11)

Our potential function is now of the form

$$\phi(\mathbf{x}_c; \theta, J) = \prod_{f=1}^{F} \sum_{i=1}^{N} \beta_{f,i} \exp\left(-(\mathbf{x}_c - \boldsymbol{\mu}_{f,i})^T \Sigma_{f,i}^{-1} (\mathbf{x}_c - \boldsymbol{\mu}_{f,i})\right). \tag{12}$$
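The step from a one-dimensional Gaussian in the filter response to a Gaussian over the whole patch can be checked numerically. In the sketch below the matrix is the rank-one form $J J^T / (2\sigma^2)$, which is what matching the two exponents forces. The vector mean $\mu J / \langle J, J \rangle$ is our assumption, chosen because it satisfies $\langle J, \boldsymbol{\mu} \rangle = \mu$ without dividing by the sum of the filter entries; it is not necessarily the authors' choice. The filter and patch values are hypothetical.

```python
import numpy as np


def lift_gaussian(J, mu, sigma):
    """Rewrite exp(-(<J, x> - mu)^2 / (2 sigma^2)) as exp(-(x - m)^T P (x - m)).

    Returns the matrix P (playing the role of Sigma^{-1}) and a vector mean m.
    The choice m = mu * J / <J, J> is an assumption made for this sketch; any m
    with <J, m> = mu gives the same exponent.
    """
    J = np.asarray(J, dtype=float)
    P = np.outer(J, J) / (2.0 * sigma ** 2)
    m = mu * J / (J @ J)
    return P, m


J = np.array([0.5, 0.5, -0.5, -0.5])   # hypothetical 2 x 2 filter, flattened
mu, sigma = 3.0, 2.0
P, m = lift_gaussian(J, mu, sigma)

x = np.array([120.0, 118.0, 90.0, 95.0])  # an arbitrary patch
lhs = (J @ x - mu) ** 2 / (2.0 * sigma ** 2)   # exponent in the 1-D form
rhs = (x - m) @ P @ (x - m)                     # exponent in the n-D form
print(np.isclose(lhs, rhs), np.linalg.matrix_rank(P))
```

The rank of `P` is one, which is why a single expert on its own does not define a proper Gaussian over the patch.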



In order to expand the product in equation (12), we use the following
result about the product of Gaussian distributions [ ]: given $K$
Gaussians (with means $\boldsymbol{\mu}_1 \ldots \boldsymbol{\mu}_K$, and covariances $\Sigma_1 \ldots \Sigma_K$),
the covariance of the product ($\Sigma'$) is given by

$$\Sigma' = \left(\sum_{k=1}^{K} \Sigma_k^{-1}\right)^{-1} \tag{13}$$

(although each $\Sigma_k^{-1}$ is singular in our case, their sum is not).
The mean of the product ($\boldsymbol{\mu}'$) is given by

$$\boldsymbol{\mu}' = \Sigma' \left(\sum_{k=1}^{K} \Sigma_k^{-1} \boldsymbol{\mu}_k\right). \tag{14}$$

The corresponding beta term for the product is just $\beta' = \prod_{k=1}^{K} \beta_k$.
In our case this results in a final approximation which
is a mixture of $3^3 = 27$ Gaussians.
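Equations (13) and (14) translate directly into a few lines of NumPy. The sketch below uses a deliberately small two-dimensional example with illustrative values, in which each inverse covariance is rank-one (and so singular) but their sum is invertible. It also enumerates the component combinations produced when three mixtures of three Gaussians are multiplied out.

```python
from itertools import product

import numpy as np


def product_of_gaussians(precisions, means):
    """Equations (13) and (14): covariance and mean of a product of Gaussians.

    `precisions` are the inverse covariances; they may be singular
    individually as long as their sum is invertible.
    """
    total_precision = sum(precisions)
    covariance = np.linalg.inv(total_precision)
    mean = covariance @ sum(P @ m for P, m in zip(precisions, means))
    return covariance, mean


# Two rank-one precisions in 2-D (illustrative values): each constrains one direction.
P1 = np.outer([1.0, 0.0], [1.0, 0.0])          # constrains x[0]
P2 = 4.0 * np.outer([0.0, 1.0], [0.0, 1.0])    # constrains x[1], more tightly
m1 = np.array([2.0, 0.0])
m2 = np.array([0.0, -1.0])

covariance, mean = product_of_gaussians([P1, P2], [m1, m2])
print(covariance)
print(mean)

# Multiplying F mixtures of N Gaussians each yields N**F component combinations.
combinations = list(product(range(3), repeat=3))
print(len(combinations))  # 27
```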

4 Inference 

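In order to perform belief-propagation, we must first be able
to express equations (4) and (5) in terms of the Gaussian mixtures
we have defined. In our setting, the sum in equation (4)
becomes an integral, resulting in the new equation

$$M_{i \to j}(S_{i,j}) = \int_{\mathbf{x}_i \setminus \mathbf{x}_j} \phi(\mathbf{x}_i) \prod_{\mathbf{x}_k \in (\Gamma_{\mathbf{x}_i} \setminus \{\mathbf{x}_j\})} M_{k \to i}(S_{k,i}). \tag{15}$$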

We have already suggested how to perform the above multiplication
in equations (13) and (14). The only difference in this
case is that the mixtures for each message may contain fewer
variables (and smaller covariance matrices) than the local distribution
($\phi(\mathbf{x}_i)$). In such a case, the inverse covariance matrices
for each message ($\Sigma^{-1}$s) are simply assumed to be zero in
all missing variables.

To compute the marginal distribution of a Gaussian mixture
with mean $\boldsymbol{\mu}$ and covariance matrix $\Sigma$ (i.e. the integral in equation
(15)), we simply take the elements of $\boldsymbol{\mu}$ and $\Sigma$ corresponding
to the variables whose marginals we want. The importances
for each Gaussian in the mixture remain the same.

Of course, when we compute the products in equation (15), 
we produce a model with an exponentially increasing number 
of Gaussians. As a simple solution to this problem, we restrict 
the maximum number of Gaussians to a certain limit (see sec- 
tion 5), by including only those with the highest importances. 
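The pruning rule described above amounts to a sort followed by a slice. In this sketch a component is represented as a `(beta, mean, covariance)` tuple, which is a representation we assume for illustration; the importances are left unnormalized.

```python
def prune_mixture(components, max_components):
    """Keep the `max_components` Gaussians with the largest importances.

    Each component is a (beta, mean, covariance) tuple; the survivors are
    returned in decreasing order of beta. Betas are left unnormalized.
    """
    ranked = sorted(components, key=lambda component: component[0], reverse=True)
    return ranked[:max_components]


# Illustrative 1-D components: (beta, mean, variance).
mixture = [(0.10, 0.0, 9.0), (0.65, 0.0, 1.0), (0.25, 0.0, 3.0)]
print(prune_mixture(mixture, 1))
```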

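When solving an inpainting problem, we only wish to treat
some of the variables in each clique as unknowns (for example,
the 'scratched' sections). Hence the potential function for these
cliques should be conditioned upon the 'observed' regions of
the image. Suppose that for a clique $c$ we have unknowns $\mathbf{x}_{(u)}$,
and observed variables $\mathbf{x}_{(o)}$ (i.e. $\mathbf{x}_c = (\mathbf{x}_{(u)}; \mathbf{x}_{(o)})^T$). Then we
may partition the mean and covariance matrix (for a particular
Gaussian in the mixture) as

$$\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_{(u)} \\ \boldsymbol{\mu}_{(o)} \end{pmatrix} \tag{16}$$

and

$$\Sigma = \begin{pmatrix} \Sigma_{(u,u)} & \Sigma_{(u,o)} \\ \Sigma_{(o,u)} & \Sigma_{(o,o)} \end{pmatrix}. \tag{17}$$

The mean of the conditional distribution ($\boldsymbol{\mu}_{(u|o)}$) is now given
by

$$\boldsymbol{\mu}_{(u|o)} = \boldsymbol{\mu}_{(u)} + \Sigma_{(u,o)} \Sigma_{(o,o)}^{-1} \left(\mathbf{x}_{(o)} - \boldsymbol{\mu}_{(o)}\right) \tag{18}$$

and the covariance matrix ($\Sigma_{(u|o)}$) is given by

$$\Sigma_{(u|o)} = \Sigma_{(u,u)} - \Sigma_{(u,o)} \Sigma_{(o,o)}^{-1} \Sigma_{(o,u)}. \tag{19}$$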
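Conditioning a single Gaussian on the observed pixels of a clique uses the standard identities for a partitioned Gaussian: the conditional mean is the unknown block of the mean plus $\Sigma_{(u,o)} \Sigma_{(o,o)}^{-1}$ times the observation residual, and the conditional covariance is the Schur complement of the observed block. The following NumPy sketch is an illustration only; the function name, the index-list interface, and the example values are our assumptions.

```python
import numpy as np


def condition_gaussian(mean, covariance, unknown, observed, observed_values):
    """Condition a Gaussian on observed variables.

    `unknown` and `observed` are index lists into the clique; returns the mean
    and covariance of the unknown variables given `observed_values`.
    """
    mean = np.asarray(mean, dtype=float)
    covariance = np.asarray(covariance, dtype=float)
    S_uu = covariance[np.ix_(unknown, unknown)]
    S_uo = covariance[np.ix_(unknown, observed)]
    S_oo = covariance[np.ix_(observed, observed)]
    # gain = S_uo S_oo^{-1}, computed with a linear solve instead of an inverse
    gain = np.linalg.solve(S_oo, S_uo.T).T
    residual = np.asarray(observed_values, dtype=float) - mean[observed]
    cond_mean = mean[unknown] + gain @ residual
    cond_cov = S_uu - gain @ S_uo.T
    return cond_mean, cond_cov


# Two correlated pixels (illustrative values): pixel 0 unknown, pixel 1 observed as 1.0.
cond_mean, cond_cov = condition_gaussian(
    mean=[0.0, 0.0],
    covariance=[[1.0, 0.8], [0.8, 1.0]],
    unknown=[0],
    observed=[1],
    observed_values=[1.0],
)
print(cond_mean, cond_cov)
```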

Finally, once all messages have been propagated, we are able
to compute the marginal distribution for a given node (or pixel,
belonging to clique $c$) by marginalizing $D_c(\mathbf{x}_c)$ (equation (5))
in terms of that node. In order to estimate the 'most likely'
configuration for this pixel, we simply consider each of the 256
possible gray-levels.^
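The final read-out step can be sketched as an exhaustive search over gray-levels of a one-dimensional mixture. The mixture parameters below are illustrative, and the components are written as normalized densities with explicit weights, which is an assumption of this sketch.

```python
import math


def most_likely_gray_level(weights, means, variances, levels=256):
    """Evaluate a 1-D Gaussian mixture at every gray-level and return the best one."""
    def density(x):
        return sum(
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
    return max(range(levels), key=density)


# Illustrative marginal for one pixel: a dominant mode near 120 and a weaker one near 200.
print(most_likely_gray_level([0.7, 0.3], [120.4, 200.0], [25.0, 100.0]))
```

For a single Gaussian the search reduces to rounding the mean to the nearest gray-level.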

4.1 Propagation Methods 

As we mentioned in section 2.2, the two propagation techniques 
we will deal with are the junction-tree algorithm and loopy 
belief-propagation. Although we will not cover these in great 
detail (see [ ] for a more complete exposition), we will explain 
the differences between the two in terms of image inpainting. 

Both algorithms work by passing messages between those 
cliques with non-empty intersection. However, when using 
the junction-tree algorithm, we connect only enough cliques to 
form a maximal spanning tree. Now, suppose that two cliques
$c_a$ and $c_b$ have intersection $S_{a,b}$. If each clique along the path
between them also contains $S_{a,b}$, we say that this spanning
tree obeys the 'junction-tree property'. If this property holds,
it can be proven that exact inference is possible (subject only
to the approximations used by our Gaussian model), and requires
that messages be passed only for a single iteration [ ].
Technically, the graphs which obey this property are the so-called
triangulated or chordal graphs. The tractability of exact
inference in these graphs depends on their tree-width: graphs
that are more 'tree-like' are better suited to efficient exact
inference. See [ ] for details.
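The junction-tree property stated above can be checked directly on a candidate spanning tree. In this sketch cliques are sets of pixel labels and the tree is an edge list over clique indices; both representations, and the small examples, are assumptions made for illustration. The check is a naive one over all pairs of cliques and is not meant to be efficient.

```python
from itertools import combinations


def tree_path(edges, start, goal):
    """Return the unique path between two nodes of a tree given as an edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in neighbors.get(node, []):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return None


def has_junction_tree_property(cliques, edges):
    """Check that every clique on the path between two cliques contains their intersection."""
    for a, b in combinations(range(len(cliques)), 2):
        shared = cliques[a] & cliques[b]
        if not shared:
            continue
        path = tree_path(edges, a, b)
        if path is None or any(not shared <= cliques[i] for i in path):
            return False
    return True


# A chain of overlapping cliques (pixel labels are illustrative).
chain = [{1, 2}, {2, 3}, {3, 4}]
print(has_junction_tree_property(chain, [(0, 1), (1, 2)]))

# Cliques 0 and 2 share pixel 1, but the clique between them does not contain it.
loop = [{1, 2}, {2, 3}, {1, 3}]
print(has_junction_tree_property(loop, [(0, 1), (1, 2)]))
```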

If this property doesn't hold, then we may resort to using 
loopy belief-propagation, in which case we simply connect all 
cliques with non-empty intersection. There is no longer any 
message passing order for which equation (15) is well defined 
(i.e. we must have a criterion to initialize some messages, and 
the common choice is to assume they have a uniform distribu- 
tion), meaning that messages must be passed for many itera- 
tions in the hope that they will converge. 

^Although this final step may appear to make the running time of our solu- 
tion linear in the number of gray-levels, it should be noted that this step needs 
to be performed only once, after the final iteration. It should also be noted that 
this estimate only requires us to measure the response of a one-dimensional 
Gaussian, which is inexpensive. More sophisticated mode-finding techniques 
exist [ ], which we considered to be unnecessary in this case. Finally, note that 
this step is not required when our mixture contains only a single Gaussian, in 
which case we simply select the mean.

Figure 3: The graph formed from the white pixels in the left
image forms a junction-tree (assuming a 2 x 2 model). The 
graphs formed from the white pixels in the other two images 
do not. 

Figure 3 shows an inpainting problem for which a junction-tree
exists, and two problems for which one does not (assuming
2 x 2-pixel cliques). Since the regions being inpainted are usually
thin lines (or 'scratches'), we may often observe graphs
which do in fact obey the junction-tree property in practice.

Fortunately, we found that even in those cases where no
junction-tree existed, loopy belief-propagation tended to converge
in very few iterations. Although there are few theoretical
results to justify this behavior, loopy belief-propagation typically
converges quickly in those cases where the graph almost
forms a tree (as is usually the case for the regions being
inpainted).

5 Experimental Results 

In order to perform image inpainting, we used a high-level 
(Python) implementation of the junction-tree algorithm and 
loopy belief-propagation, which is capable of constructing 
Markov random fields with any topology. Despite being written 
in a high-level language, our implementation is able to inpaint 
images within a reasonably short period of time. Since it is 
difficult to assess the quality of our results visually, we have 
reported both the peak signal-to-noise ratio (PSNR), and the
structural similarity (SSIM) [19].
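For reference, PSNR for 8-bit gray-level images is ten times the base-10 logarithm of the squared peak value (255) divided by the mean squared error. A minimal sketch follows, with images given as flat sequences of pixel values, which is a simplification of ours; SSIM is considerably more involved and is not sketched here.

```python
import math


def psnr(original, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized gray-level images.

    Images are given as flat sequences of pixel values.
    """
    if len(original) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


print(round(psnr([10, 20, 30, 40], [12, 18, 33, 40]), 2))
```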

We were not able to directly compare our PSNR results to 
those in [ ], since they only presented 3x3 and 5x5 mod- 
els. While it is certainly true that their 3x3 model produces a 
much higher PSNR than our technique (e.g. a PSNR of ~31.4 
for the image in figure 4), its execution time is simply imprac- 
tical. Fortunately, it is still possible to measure approximate 
execution times of a 2 x 2 model using their approach, even 
without reporting PSNRs. These results are presented in the 
next section. 

While it is true that the difference between the two models 
being compared makes meaningful comparison difficult, it is 
most important to note that there is little visual difference be- 
tween the two models. In the next section, we will show that 
our 2x2 model is faster than a similar model using gradient- 
ascent - it is the combination of these two results which we 
believe makes our technique viable. 




Figure 4: Above, top-left to bottom-right: the original image; 
the image containing the text to be removed; inpainting after a 
single iteration, using a single Gaussian (PSNR = 22.74, SSIM 
= 0.962); inpainting after two iterations (PSNR = 22.82, SSIM 
= 0.962). Below: close-ups of all images. 

Figure 4 shows a corrupted image from which we want to 
remove the text. The image has been inpainted using a model 
containing only a single Gaussian (although the learned mix- 
tures contained three Gaussians - see below). After a single 
iteration, most of the text has been removed, and after two it- 
erations it is almost completely gone. Although the current 
state-of-the-art inpainting techniques produce superior results 
in terms of PSNR [16], they give similar visual results and 
take several thousand iterations to converge, compared to ours 
which takes only two (no further improvement was observed 
after a third iteration). 

Figure 5 compares models of various sizes, varying both the 
number of Gaussians used to approximate each mixture, as well 
as the maximum number of Gaussians allowed during the infer- 
ence stage. The same results are summarized in table 1 . 

The top-right image in figure 5 was produced using a model
in which each expert was approximated using three Gaussians,
yet only one Gaussian was allowed during propagation. In contrast,
the model used to produce the top-left image was approximated
using only a single Gaussian. Interestingly, the former
model actually outperformed the latter in this experiment.
While this result may seem surprising, it may be explainable as
follows: in the single-Gaussian model, the standard deviation
is overestimated in order to compensate for the high kurtosis
of the training data [ ]. However, in the model containing three
Gaussians, the most significant Gaussian (i.e. the Gaussian with
the highest $\beta$ term) captures most of the 'important' information
about the distribution, and ignoring the other two is
not very harmful.

Figure 5: Above: the original image; the corrupted image containing
'scratches'. Below, top-left to bottom-right: mixture
contains 1 Gaussian; mixture contains 3 Gaussians, propagation
is performed with 1 Gaussian; propagation is performed
with 3 Gaussians; propagation is performed with 9 Gaussians.
All results are shown using the 2 x 2 model, after three iterations.
See table 1 for more detail.

Furthermore, given that increasing the maximum number of 
Gaussians allowed during propagation does not seem to signif- 
icantly improve inpainting performance, we suggest that this 
single-Gaussian model may be the most practical. Even after 
only a single iteration, the results are visually pleasing. 

Finally, although we mentioned that it is easy to learn 3x3 
or larger models using the proposed method, such models were 
found to be impractical in an inference setting. For example, 
a 3 x 3 model would have a total of 8 filters, resulting in a
mixture model with $3^8 = 6561$ Gaussians. Although this problem may
again be addressed by simply including only the most important 
Gaussians, this results in a very high error in the approximation; 
we found that this model did not outperform the 2x2 version 
in practice.

Table 1: Comparison of inpainting performance for several
models. Here we vary the number of Gaussians used to com- 
pute the initial mixture, as well as the maximum number of 
Gaussians allowed during propagation. 



Figure 6: Two equally large regions to be inpainted, in two 
differently sized images. 

5.1 Execution Times 

Unfortunately, it proved very difficult to compare the execution 
times of our model with existing gradient-ascent techniques.
For example, the inpainting algorithm used in [ ] computes
the gradient for all pixels using a 2-dimensional matrix con- 
volution over the entire image, and then selects only the re- 
gion corresponding to the inpainting mask. While this results 
in very fast performance when a reasonable proportion of an 
image is being inpainted, it results in very slow performance 
when the inpainting region is very sparse (as is often the case 
with scratches). It is easy to produce results which favor either 
algorithm, but such a comparison will likely be unfair. 

To make this difficulty explicit, consider the images in figure
6. The image on the left is significantly larger than the image
on the right, yet the corrupted regions are of the same size
(~1500 pixels). As a result, our algorithm exhibited the same
running time on both images, whereas the gradient-ascent algorithm
from [ ] was approximately 6 times slower on the larger
image.

n    Multiplications    Inverses
1    14800              44648
2    42072              37386
3    25760              12880
4    43308              21654

Table 2: Number of operations required by our algorithm. Multiplications
are of (n x n) x (n x 1) matrices, and inverses are
of (n x n) matrices, for various values of n.



As a more representative example, when inpainting the im- 
age in figure 5 (using a single Gaussian), the first iteration took 
~33.6 seconds on our test machine. The second iteration took 
~39.0 (as did subsequent iterations - the first is slightly faster 
due to many messages being empty at this stage). The running 
time of this algorithm increases linearly with the number of 
Gaussians (for example, when using three Gaussians, the first 
iteration took ~87.0 seconds). 

By comparison, a single iteration of inpainting using the
gradient-ascent algorithm from [ ] took ~0.1 seconds (using
a 2 x 2 model). However, their code was run for 2,500 iterations
(roughly 250 seconds in total), meaning that our code is still on
the order of 2 to 3 times faster. This is a pleasing result, given
that we used a high-level language for our implementation.
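The arithmetic behind this comparison can be reproduced from the timings quoted above. The sketch assumes the single-Gaussian timings for figure 5 and counts either two or three belief-propagation iterations; the constant and function names are ours.

```python
FIRST_ITERATION_S = 33.6      # belief-propagation, first iteration (figure 5)
LATER_ITERATION_S = 39.0      # belief-propagation, each subsequent iteration
GRADIENT_ITERATION_S = 0.1    # gradient-ascent, one iteration
GRADIENT_ITERATIONS = 2500


def bp_seconds(iterations: int) -> float:
    """Total belief-propagation time for a given number of iterations."""
    return FIRST_ITERATION_S + LATER_ITERATION_S * (iterations - 1)


gradient_seconds = GRADIENT_ITERATION_S * GRADIENT_ITERATIONS
for iterations in (2, 3):
    speedup = gradient_seconds / bp_seconds(iterations)
    print(f"{iterations} iterations: {bp_seconds(iterations):.1f} s, "
          f"{speedup:.1f}x faster than gradient-ascent")
```

Under these assumptions the ratio is a little above 2 when three iterations are counted and a little above 3 when two are counted.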

However, in an attempt to provide a more 'fair' comparison, 
we have tried to analyze the computations required by both al- 
gorithms. It can be seen from the equations presented in section 
4 that our algorithm consists (almost) entirely of matrix multi- 
plications and inverses.^ Although it is very difficult to express 
exactly the number of such operations required by our algo- 
rithm in general, we have calculated this number for a specific 
case. 

The corrupted image in figure 5 requires us to inpaint a to- 
tal of 5829 pixels. The number of operations required by our 
algorithm to inpaint this image (during the second iteration) is 
shown in table 2. 

By contrast, the gradient-ascent approach in [ ] is dominated
by the time taken to compute the inner products in (the
derivative of the logarithm of) equation (3). Each pixel is contained
by four cliques, and we must compute the inner product
against each of our three filters. Therefore we must compute a
total of 4 x 3 x 5829 = 69948 inner products per iteration.

As a simple experiment, we timed these operations in Matlab 
(using random matrices and vectors). We found that computing 
69948 inner products was approximately 10 times faster than 
computing the matrix operations shown in table 2. This leads 
us to believe that a low-level implementation of our belief- 
propagation algorithms may be significantly faster even than 
the results we have shown. 
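The counts used in this comparison can be tallied as follows. The table values are copied from table 2; the last step is a rough extrapolation that assumes the measured factor of 10 per iteration carries over to a low-level implementation, so it should be read as an estimate only.

```python
# Operation counts from table 2 (second iteration, 5829 pixels to inpaint).
TABLE_2 = {
    1: {"multiplications": 14800, "inverses": 44648},
    2: {"multiplications": 42072, "inverses": 37386},
    3: {"multiplications": 25760, "inverses": 12880},
    4: {"multiplications": 43308, "inverses": 21654},
}

PIXELS = 5829
CLIQUES_PER_PIXEL = 4
FILTERS = 3

inner_products = CLIQUES_PER_PIXEL * FILTERS * PIXELS
multiplications = sum(row["multiplications"] for row in TABLE_2.values())
inverses = sum(row["inverses"] for row in TABLE_2.values())

print(inner_products)    # 69948
print(multiplications)   # 125940
print(inverses)          # 116568

# If one gradient-ascent iteration costs about 1/10 of one belief-propagation
# iteration, then 2,500 gradient-ascent iterations cost about as much as this
# many belief-propagation iterations:
equivalent_bp_iterations = 2500 / 10
print(equivalent_bp_iterations)
```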



^ Other operations, such as additions and permutations are typically much 
faster than these. 



6 Discussion 

Our results have shown that even a 2 x 2 model is able to produce
very satisfactory inpainting performance. We believe that
even this small model is able to capture much of the important 
information about natural images. While higher-order models 
exist [ ], the improvements appear to be quite incremental, 
despite a significant increase in their execution time. While it 
is certainly the case that our results fall short of the state-of-the- 
art in terms of PSNR, the differences are difficult to distinguish 
visually. It is therefore pleasing that we are able to produce 
competitive results within only a short period of time. 

We have not yet fully explored the possibility of using the 
junction-tree algorithm to inpaint images. Unfortunately, de- 
termining whether a graph obeys the junction-tree property (see 
section 4.1) is very expensive, meaning we simply used loopy 
belief-propagation in all cases, without even performing this 
test. However, there are many cases in which we can be sure 
that a junction-tree exists - for example, if the inpainting re- 
gion is a scratch which is only one or two pixels wide. In such 
cases, optimal results can be produced after only a single it- 
eration, which would render our algorithm several times faster 
again. 

In spite of this, we found that loopy belief-propagation 
tended to converge in very few iterations. While we believe 
it helped that the regions we are inpainting appear to be fairly 
'tree-like', there is very little theory to support this claim. On 
the other hand, loopy belief-propagation often converges far 
slower when dealing with large regions, meaning that we can 
inpaint a 'scratch' much faster than a 'coffee stain'. 

We have also not considered the possibility that the corrupted 
pixels may contain some information about the original image. 
Many gradient-ascent approaches implicitly exploit this possibility
by initializing their algorithms using the corrupted pixels.
If the restored image is 'close to' the corrupted image, this can 
result in faster convergence. Our approach is also able to deal 
with this possibility by augmenting the graphical model with 
an observation layer with the respective noise model for the 
damaged pixels. 



7 Conclusion 

In this paper, we have developed a model for inpainting im- 
ages quickly using belief-propagation. While image inpaint- 
ing has previously been performed using low-order models by 
belief-propagation, and high-order models by gradient-ascent, 
we have presented new methods which manage to exploit the 
benefits of both, while avoiding their shortcomings. We have 
shown these algorithms to give satisfactory visual results and to 
be faster than existing gradient-based techniques, even in spite 
of our high-level implementation. 